Mining Long High Utility Itemsets in Transaction Databases

Authors

  • GUANGZHU YU
  • SHIHUANG SHAO
  • DAOQING SUN
  • BIN LUO

Abstract

Although support has been used as a fundamental measure of the statistical importance of an itemset, it cannot express richer information such as quantity sold, unit profit, or other numerical attributes. To overcome this shortcoming, utility is used to measure semantic importance, and several algorithms for utility mining have been proposed. However, existing utility mining algorithms adopt an Apriori-like candidate set generation-and-test approach and are inadequate for databases with long patterns. To solve this problem, this paper proposes a hybrid model and a novel algorithm, inter-transaction, that discovers high utility itemsets from two directions: existing algorithms such as UMining [1] seek short high utility itemsets from the bottom up, while inter-transaction seeks long high utility itemsets from the top down. To avoid the costly process of extending short itemsets step by step, inter-transaction finds long itemsets directly by intersecting relevant transactions. Experiments on synthetic data show that the new algorithm performs well, especially on high-dimensional datasets.

Keywords: utility; long high utility itemset; intersection transaction; partition; hybrid model
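
The two ideas in the abstract, measuring an itemset by utility instead of support and finding long itemsets by intersecting transactions, can be illustrated with a small brute-force sketch. It assumes the usual utility-mining definition: the utility of an item in a transaction is quantity × unit profit, and the utility of an itemset is summed over every transaction that contains the whole itemset. The toy transactions, the unit profits, and the functions `itemset_utility`, `long_itemsets_by_intersection` and `high_utility_long_itemsets` are hypothetical. This is not the paper's inter-transaction algorithm. It leaves out the hybrid model, the partitioning and all pruning, and simply intersects every group of k transactions.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> quantity sold.
transactions = [
    {"a": 2, "b": 1, "c": 3, "d": 1},
    {"a": 1, "b": 2, "c": 1, "e": 4},
    {"a": 3, "b": 1, "c": 2, "d": 2},
]
# Hypothetical external utility (unit profit) of each item.
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4, "e": 3}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit of the itemset's items over every
    transaction that contains all of them (assumed utility definition)."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def long_itemsets_by_intersection(transactions, k):
    """Row-based search: intersect the item sets of every group of k
    transactions. Each non-empty intersection is an itemset shared by
    at least those k transactions, found without growing it item by item."""
    found = set()
    for group in combinations(transactions, k):
        common = frozenset.intersection(*(frozenset(t) for t in group))
        if common:
            found.add(common)
    return found


def high_utility_long_itemsets(transactions, unit_profit, k, min_util):
    """Keep the intersected itemsets whose utility reaches min_util."""
    result = {}
    for itemset in long_itemsets_by_intersection(transactions, k):
        u = itemset_utility(itemset, transactions, unit_profit)
        if u >= min_util:
            result[itemset] = u
    return result
```

With `k=2` and `min_util=45`, only `{a, b, c, d}` survives. Its utility is 46: 19 from the first transaction plus 27 from the third. `{a, b, c}` has utility 44 and falls just below the threshold. The sketch enumerates groups of transactions rather than combinations of items, which roughly conveys why a row-based search suits data with many items per transaction. Because the number of groups grows combinatorially with the number of transactions, a practical method would need the pruning that this sketch omits.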

Similar Resources

A New Algorithm for High Average-utility Itemset Mining

High utility itemset mining (HUIM) is an emerging field in data mining that has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...

Mining High Utility Itemsets from Large Transactions using Efficient Tree Structure

Mining high utility itemsets from a transactional database refers to the discovery of itemsets with high utility like profits. It is an extension of the frequent pattern mining. Although a number of relevant algorithms have been proposed in recent years, they incur the problem of producing a large number of candidate itemsets for high utility itemsets. Such a large number of candidate itemsets ...

High Utility Itemset Mining

Data mining can be defined as an activity that extracts new, nontrivial information contained in large databases. Traditional data mining techniques have focused largely on detecting statistical correlations between the items that are more frequent in transaction databases. Also termed frequent itemset mining, these techniques were based on the rationale that itemsets which appe...

A Fuzzy Algorithm for Mining High Utility Rare Itemsets – FHURI

Classical frequent itemset mining identifies frequent itemsets in transaction databases using only the frequency of item occurrences, without considering the utility of items. In many real-world situations, the utility of itemsets is based on the user's perspective, such as cost, profit, or revenue, and is of significant importance. Utility mining considers utility factors in data mining tasks. Utility-...

A Hybrid Method for High-Utility Itemsets Mining in Large High-Dimensional Data

Existing algorithms for high-utility itemset mining are based on column enumeration and adopt an Apriori-like candidate set generation-and-test approach, and thus are inadequate for datasets with high dimensions or long patterns. To solve the problem, this paper proposes a hybrid model and a row-enumeration-based algorithm, i.e., Inter-transaction, to discover high-utility itemsets from two directio...

Publication date: 2007